NSF PAR Search | NSF Public Access Repository


Reconciling Hardware Transactional Memory and Persistent Programming with Buffered Durability

https://doi.org/10.1145/3694906.3743321

Du, Mingzhe; Su, Ziheng; Scott, Michael L (July 2025, ACM)

Hardware Transactional Memory (HTM) simplifies concurrent programming and can accelerate multithreaded execution through lock elision. Non-Volatile Memory (NVM) combines the speed and byte addressability of DRAM with the durability of storage, enabling the construction of high-performance, persistent data structures. Unfortunately, the write-back instructions typically needed to ensure post-crash consistency in NVM cause HTM transactions to abort, precluding the straightforward combination of HTM and persistent data structures. The problem goes away on machines with persistent caches, but these require special battery-backed circuitry and are far from commonplace. To combine HTM and persistent data structures, we advocate for buffered durable linearizability (BDL), a relaxed correctness criterion that enables recovery to a "recent" consistent state in the wake of a crash, allowing writes-back to occur outside transactions. Significantly, BDL retains the persistence guarantees of storage systems, such as databases backed by disks or flash, that have relied on buffering for decades. The combination of HTM and buffered durability enables three separate usage scenarios. First, we add durability to an existing HTM-based structure (a van Emde Boas tree due to Khalaji et al.); second, we use HTM to simplify an existing persistent structure (a skiplist due to Wang et al.); third, we "back port" an HTM-based structure optimized for persistent caches (a hash table due to Zhang et al.) to work well on more conventional processors. The first two scenarios yield several-fold improvements in throughput; the third sees very little slowdown.
Free, publicly-accessible full text available July 16, 2026
Trigonal Bipyramidal or Square Planar? Density Functional Theory calculations of iron bis(dithiolene) N-heterocyclic carbene complexes

https://doi.org/10.1039/D4DT02650K

Merkel, Katherine; Santos, Alyssa_V B; Simpson, Scott Michael (January 2024, Dalton Transactions)

Density functional theory (DFT) calculations of 57 iron bis(dithiolene)-N-heterocyclic carbene adducts were conducted to determine which parameters predict, and possibly influence, the coordination of these adducts. The parameters considered...
Full Text Available
Accelerating Finite-Element Structural Elastic Dynamic Analysis Using GPU Computing

Araújo, Gustavo; Simpson, Barbara; Zhu, Minjie; Scott, Michael (January 2024, 18th World Conference on Earthquake Engineering (18WCEE))

The demand for high-performance computing resources has led to a paradigm shift towards massive parallelism using graphics processing units (GPUs) in many scientific disciplines, including machine learning, robotics, quantum chemistry, molecular dynamics, and computational fluid dynamics. In earthquake engineering, artificial intelligence and data-driven methods have gained increasing attention for leveraging GPU computing for seismic analysis and evaluation of structures and regions. However, in finite-element analysis (FEA) applications for civil structures, the progress in GPU-accelerated simulations has been slower due to the unique challenges of porting structural dynamic analysis to the GPU, including the reliance on different element formulations, nonlinearities, coupled equations of motion, implicit integration schemes, and direct solvers. This research discusses these challenges and potential solutions to fully accelerate the dynamic analysis of civil structural problems. To demonstrate the feasibility of a fully GPU-accelerated FEA framework, a pilot GPU-based program was built for linear-elastic dynamic analyses. In the proposed implementation, the assembly, solver, and response update tasks of FEA were ported to the GPU, while the central processing unit (CPU) instructed the GPU on how to perform the corresponding computations and off-loaded the simulated response upon completion of the analysis. Since GPU computing is massively parallel, the GPU platform can operate on every node and element in the model simultaneously. As a result, finer mesh discretization in FEA will not significantly increase run time on the GPU for the assembly and response update stages. Work remains to refine the program for nonlinear dynamic analysis.
Full Text Available
EMBER: Efficient Multiple-Bits-Per-Cell Embedded RRAM Macro for High-Density Digital Storage

https://doi.org/10.1109/JSSC.2024.3387566

Levy, Akash; Upton, Luke R; Scott, Michael D; Rich, Dennis; Khwa, Win-San; Chih, Yu-Der; Chang, Meng-Fan; Mitra, Subhasish; Murmann, Boris; Raina, Priyanka (July 2024, IEEE Journal of Solid-State Circuits)

Full Text Available
Transactional Composition of Nonblocking Data Structures

https://doi.org/10.1145/3558481.3591079

Cai, Wentao; Wen, Haosen; Scott, Michael L. (June 2023, 35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA))

This paper introduces nonblocking transaction composition (NBTC), a new methodology for atomic composition of nonblocking operations on concurrent data structures. Unlike previous software transactional memory (STM) approaches, NBTC leverages the linearizability of existing nonblocking structures, reducing the number of memory accesses that must be executed together, atomically, to only one per operation in most cases (these are typically the linearizing instructions of the constituent operations). Our obstruction-free implementation of NBTC, which we call Medley, makes it easy to transform most nonblocking data structures into transactional counterparts while preserving their liveness and high concurrency. In our experiments, Medley outperforms Lock-Free Transactional Transform (LFTT), the fastest prior competing methodology, by 40–170%. The marginal overhead of Medley's transactional composition, relative to separate operations performed in succession, is roughly 2.2×. For persistent data structures, we observe that failure atomicity for transactions can be achieved "almost for free" with epoch-based periodic persistence. Toward that end, we integrate Medley with nbMontage, a general system for periodically persistent data structures. The resulting txMontage provides ACID transactions and achieves throughput up to two orders of magnitude higher than that of the OneFile persistent STM system.
Full Text Available
Transactional Composition of Nonblocking Data Structures

https://doi.org/10.1145/3572848.3577503

Cai, Wentao; Wen, Haosen; Scott, Michael L. (February 2023, 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP))

We introduce nonblocking transaction composition (NBTC), a new methodology for atomic composition of nonblocking operations on concurrent data structures. Unlike previous software transactional memory (STM) approaches, NBTC leverages the linearizability of existing nonblocking structures, reducing the number of memory accesses that must be executed together, atomically, to only one per operation in most cases (these are typically the linearizing instructions of the constituent operations). Our obstruction-free implementation of NBTC, which we call Medley, makes it easy to transform most nonblocking data structures into transactional counterparts while preserving their nonblocking liveness and high concurrency. In our experiments, Medley outperforms Lock-Free Transactional Transform (LFTT), the fastest prior competing methodology, by 40–170%. The marginal overhead of Medley's transactional composition, relative to separate operations performed in succession, is roughly 2.2×. For persistent memory, we observe that failure atomicity for transactions can be achieved "almost for free" with epoch-based periodic persistence. Toward that end, we integrate Medley with nbMontage, a general system for periodically persistent data structures. The resulting txMontage provides ACID transactions and achieves throughput up to two orders of magnitude higher than that of the OneFile persistent STM system.
Full Text Available
Challenges in GPU-Accelerated Nonlinear Dynamic Analysis for Structural Systems

https://doi.org/10.1061/JSENDH.STENG-11311

Simpson, Barbara G.; Zhu, Minjie; Seki, Akiri; Scott, Michael (March 2023, Journal of Structural Engineering)

Full Text Available
EMBER: A 100 MHz, 0.86 mm², Multiple-Bits-per-Cell RRAM Macro in 40 nm CMOS with Compact Peripherals and 1.0 pJ/bit Read Circuitry

https://doi.org/10.1109/ESSCIRC59616.2023.10268807

Upton, Luke R.; Levy, Akash; Scott, Michael D.; Rich, Dennis; Khwa, Win-San; Chih, Yu-Der; Chang, Meng-Fan; Mitra, Subhasish; Raina, Priyanka; Murmann, Boris (September 2023, ESSCIRC 2023 - IEEE 49th European Solid State Circuits Conference (ESSCIRC))
How Should We Think about Persistent Data Structures?

https://doi.org/10.1145/3519270.3538455

Scott, Michael L. (July 2022, 41st ACM Symposium on Principles of Distributed Computing (PODC))

Salerno, Italy
Full Text Available
Fast Nonblocking Persistence for Concurrent Data Structures (extended abstract)

Cai, Wentao; Wen, Haosen; Maksimovski, Vladimir; Du, Mingzhe; Sanna, Rafaello; Abdallah, Shreif; Scott, Michael L. (May 2022, 13th Annual Non-Volatile Memories Workshop)

San Diego, CA
Full Text Available
